Trace Cache Redundancy: Red & Blue Traces
Abstract
The objective of this paper is to improve the use of the hardware resources of the trace cache mechanism, reducing its implementation cost without degrading performance. We achieve this by eliminating the replication of traces between the instruction cache and the trace cache. As we show, the trace cache mechanism generates a high degree of redundancy between the traces stored in the trace cache and those built by the compiler, which are already present in the instruction cache. Furthermore, code reordering techniques such as the software trace cache arrange the basic blocks of a program so that the fall-through path is the most common one, effectively increasing this trace redundancy. We propose selective trace storage to avoid trace redundancy between the trace cache and the instruction cache. A simple modification of the fill unit allows the trace cache to store only those traces that contain taken branches, which cannot be obtained from the instruction cache in a single cycle. Our results show that selective trace storage combined with the software trace cache on a 32-entry trace cache (2KB) performs as well as a 2048-entry trace cache (128KB) without these enhancements. This shows that cooperation between hardware and software is crucial to improving performance and reducing the requirements of hardware mechanisms in the fetch engine.
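The selective storage rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Branch`, `Trace`, and `SelectiveFillUnit`, the dictionary standing in for the trace cache, and the indexing by start address plus branch outcomes are all assumptions made for illustration. The only behavior taken from the abstract is the filter itself: a trace is stored only if it contains a taken branch.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Branch:
    pc: int       # address of the branch instruction (hypothetical field)
    taken: bool   # outcome observed when the trace was built


@dataclass(frozen=True)
class Trace:
    start_pc: int                  # address of the first instruction
    branches: Tuple[Branch, ...]   # branches inside the trace, in order

    def contains_taken_branch(self) -> bool:
        # With no taken branch, the instructions are consecutive in memory,
        # so the instruction cache can already supply them.
        return any(b.taken for b in self.branches)


class SelectiveFillUnit:
    """Toy fill unit: stores a trace only if it contains a taken branch."""

    def __init__(self) -> None:
        # Assumed indexing: start address plus branch outcomes.
        self.trace_cache: Dict[Tuple[int, Tuple[bool, ...]], Trace] = {}
        self.skipped = 0  # traces left to the instruction cache

    def commit(self, trace: Trace) -> bool:
        """Return True if the trace was written to the trace cache."""
        if not trace.contains_taken_branch():
            self.skipped += 1
            return False
        key = (trace.start_pc, tuple(b.taken for b in trace.branches))
        self.trace_cache[key] = trace
        return True


# Usage: one purely sequential trace and one that crosses a taken branch.
fill_unit = SelectiveFillUnit()
sequential = Trace(0x1000, (Branch(0x1008, False), Branch(0x1014, False)))
non_sequential = Trace(0x2000, (Branch(0x2008, True), Branch(0x3004, False)))
stored_flags = [fill_unit.commit(sequential), fill_unit.commit(non_sequential)]
print(stored_flags, fill_unit.skipped, len(fill_unit.trace_cache))
```

In this sketch the sequential trace is skipped and only the trace that crosses a taken branch occupies a trace cache entry, which is the effect the abstract attributes to the modified fill unit.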
Similar resources
Exposing Instruction Level Parallelism in the Presence of Loops
In this thesis we explore how to utilize a loop cache to relieve the unnecessary pressure placed on the trace cache by loops. Due to the high temporal locality of loops, loops should be cached. We have observed that when loops contain control flow instructions in their bodies it is better to collect traces on a dedicated loop cache instead of using trace cache space. The traces of instructions ...
PSnAP: Accurate Synthetic Address Streams through Memory Profiles
Memory address traces are an important information source; they drive memory simulations for performance modeling, systems design and application tuning. For long running applications, the direct use of an address trace is complicated by its size. Previous attempts to reduce address trace size incurred a substantial penalty with respect to trace accuracy. We propose a novel method of memory pro...
Dynamic Profiling and Trace Cache Generation for a Java Virtual Machine
Dynamic program optimization is becoming increasingly important for achieving good runtime performance. One of the key issues in such systems is how they select which code to optimize. One approach is to dynamically detect traces, long sequences of instructions that are likely to execute to completion. Such traces can be stored in a trace cache and dispatched one trace at a time (rather than on...
TSpec: A Notation for Describing Memory Reference Traces
Interpreting reference patterns in the output of a processor is complicated by the lack of a succinct notation for humans to use when communicating about them. Since an actual trace is simply an incredibly long list of numbers, it is difficult to see the underlying patterns inherent in it. The source code, while simpler to look at, does not include the effects of compiler optimizations such as ...
Techniques for Cache and Memory Simulation Using Address Reference Traces
Simulation using address reference traces is one of the primary methods for the performance evaluation of the memory hierarchy of computer systems. In this paper we survey the techniques used in such a simulation. In both the uniprocessor and shared-memory multiprocessor cases, the issues can be divided into trace collection, trace storage, and trace usage. Trace collection can employ several h...